80 research outputs found

    Emergence of Compositional Representations in Restricted Boltzmann Machines

    Automatically extracting the complex set of features composing real high-dimensional data is crucial for achieving high performance in machine-learning tasks. Restricted Boltzmann Machines (RBM) are empirically known to be efficient for this purpose, and to be able to generate distributed and graded representations of the data. We characterize the structural conditions (sparsity of the weights, low effective temperature, nonlinearities in the activation functions of hidden units, and adaptation of fields maintaining the activity in the visible layer) allowing RBM to operate in such a compositional phase. Evidence is provided by the replica analysis of an adequate statistical ensemble of random RBMs and by RBM trained on the handwritten digits dataset MNIST. Comment: Supplementary material available at the authors' webpage.

    Statistical Physics and Representations in Real and Artificial Neural Networks

    This document presents the material of two lectures on statistical physics and neural representations, delivered by one of us (R.M.) at the Fundamental Problems in Statistical Physics XIV summer school in July 2017. In the first part, we consider the neural representations of space (maps) in the hippocampus. We introduce an extension of the Hopfield model, able to store multiple spatial maps as continuous, finite-dimensional attractors. The phase diagram and dynamical properties of the model are analyzed. We then show how spatial representations can be dynamically decoded using an effective Ising model capturing the correlation structure in the neural data, and compare applications to data obtained from hippocampal multi-electrode recordings and by (sub)sampling our attractor model. In the second part, we focus on the problem of learning data representations in machine learning, in particular with artificial neural networks. We start by introducing data representations through some illustrations. We then analyze two important algorithms, Principal Component Analysis and Restricted Boltzmann Machines, with tools from statistical physics.

    Computational protein design with evolutionary-based and physics-inspired modeling: current and future synergies

    Computational protein design facilitates discovery of novel proteins with prescribed structure and functionality. Exciting designs were recently reported using novel data-driven methodologies that can be roughly divided into two categories: evolutionary-based and physics-inspired approaches. The former infer characteristic sequence features shared by sets of evolutionarily related proteins, such as conserved or coevolving positions, and recombine them to generate candidates with similar structure and function. The latter estimate key biochemical properties such as structure free energy, conformational entropy or binding affinities using machine learning surrogates, and optimize them to yield improved designs. Here, we review recent progress along both tracks, discuss their strengths and weaknesses, and highlight opportunities for synergistic approaches.

    Staphylococcus aureus infective endocarditis versus bacteremia strains: Subtle genetic differences at stake

    Infective endocarditis (IE) is a severe condition complicating 10–25% of Staphylococcus aureus bacteremia. Although host-related IE risk factors have been identified, the involvement of bacterial features in IE complication is still unclear. We characterized strictly defined IE and bacteremia isolates and searched for discriminant features. S. aureus isolates causing community-acquired, definite native-valve IE (n=72) and bacteremia (n=54) were collected prospectively as part of a French multicenter cohort. Phenotypic traits previously reported or hypothesized to be involved in staphylococcal IE pathogenesis were tested. In parallel, the genotypic profiles of all isolates, obtained by microarray, were analyzed by discriminant analysis of principal components (DAPC). No significant difference was observed between IE and bacteremia strains, regarding either phenotypic or genotypic univariate analyses. However, the multivariate statistical tool DAPC, applied on microarray data, segregated IE and bacteremia isolates: IE isolates were correctly reassigned as such in 80.6% of the cases (C-statistic 0.83, P<0.001). The performance of this model was confirmed with an independent French collection of IE and bacteremia isolates (78.8% reassignment, C-statistic 0.65, P<0.01). Finally, a simple linear discriminant function based on a subset of 8 genetic markers retained valuable performance both in the study collection (86.1%, P<0.001) and in the independent validation collection (81.8%, P<0.01). We here show that community-acquired IE and bacteremia S. aureus isolates are genetically distinct based on subtle combinations of genetic markers. This finding provides the proof of concept that bacterial characteristics may contribute to the occurrence of IE in patients with S. aureus bacteremia.

    COVID-19 symptoms at hospital admission vary with age and sex: results from the ISARIC prospective multinational observational study

    Background: The ISARIC prospective multinational observational study is the largest cohort of hospitalized patients with COVID-19. We present relationships of age, sex, and nationality to presenting symptoms. Methods: International, prospective observational study of 60 109 hospitalized symptomatic patients with laboratory-confirmed COVID-19 recruited from 43 countries between 30 January and 3 August 2020. Logistic regression was performed to evaluate relationships of age and sex to published COVID-19 case definitions and the most commonly reported symptoms. Results: ‘Typical’ symptoms of fever (69%), cough (68%) and shortness of breath (66%) were the most commonly reported. 92% of patients experienced at least one of these. Prevalence of typical symptoms was greatest in 30- to 60-year-olds (respectively 80, 79, 69%; at least one 95%). They were reported less frequently in children (≤ 18 years: 69, 48, 23; 85%), older adults (≥ 70 years: 61, 62, 65; 90%), and women (66, 66, 64; 90%; vs. men 71, 70, 67; 93%, each P < 0.001). The most common atypical presentations under 60 years of age were nausea and vomiting and abdominal pain, and over 60 years was confusion. Regression models showed significant differences in symptoms with sex, age and country. Interpretation: This international collaboration has allowed us to report reliable symptom data from the largest cohort of patients admitted to hospital with COVID-19. Adults over 60 and children admitted to hospital with COVID-19 are less likely to present with typical symptoms. Nausea and vomiting are common atypical presentations under 30 years. Confusion is a frequent atypical presentation of COVID-19 in adults over 60 years. Women are less likely to experience typical symptoms than men.

    Relations entre faune sauvage et éleveurs au Sahara

    Introduction. South of the Ténéré desert, the Termit massif forms a line of jagged rocky hills, barely 700 m high, that extends between the dunes for about a hundred kilometres. To the south begins the steppe plain of the Ayer (map 1). Rainfall is about 200 mm per year at the village of Tasker, about a hundred kilometres southwest of Termit. It is probably lower on the massif, which places it clearly in the Saharan zone. The massif and the dunes that ...

    Machines de Boltzmann restreintes : des représentations compositionnelles à l'analyse des séquences de protéines

    Restricted Boltzmann machines (RBM) are graphical models that jointly learn a probability distribution and a representation of data. Despite their simple architecture, they can learn complex data distributions very well, such as the MNIST database of handwritten digits. Moreover, they are empirically known to learn compositional representations of data, i.e. representations that effectively decompose configurations into their constitutive parts. However, not all variants of RBM perform equally well, and few theoretical arguments exist for these empirical observations. In the first part of this thesis, we ask how such a simple model can learn such complex probability distributions and representations. By analyzing an ensemble of RBM with random weights using the replica method, we have characterised a compositional regime for RBM, and shown under which conditions (statistics of weights, choice of transfer function) it can and cannot arise. Both qualitative and quantitative predictions obtained with our theoretical analysis are in agreement with observations from RBM trained on real data. In a second part, we present an application of RBM to protein sequence analysis and design. Owing to their large size, it is very difficult to run physical simulations of proteins, and to predict their structure and function. It is however possible to infer information about a protein structure from the way its sequence varies across organisms. For instance, Boltzmann Machines can leverage correlations of mutations to predict spatial proximity of the sequence amino-acids. Here, we have shown on several synthetic and real protein families that, provided a compositional regime is enforced, RBM can go beyond structure and extract extended motifs of coevolving amino-acids that reflect phylogenetic, structural and functional constraints within proteins. Moreover, RBM can be used to design new protein sequences with putative functional properties by recombining these motifs at will.
Lastly, we have designed new training algorithms and model parametrizations that significantly improve RBM generative performance, to the point where it can compete with state-of-the-art generative models such as Generative Adversarial Networks or Variational Autoencoders on medium-scale data.

    Au Tchad, un second Darfour ?


    Civil society in Darfur : the missing peace

    Theodore Murphy; Jérôme Tubiana

    Learning protein constitutive motifs from sequence data
